Bilingual Terminology Extraction Using Multi-level Termhood

Authors

  • Chengzhi Zhang
  • Dan Wu
Abstract

Purpose: Terminology is the set of technical words or expressions used in a specific context. It denotes the core concepts of a formal discipline and is commonly applied in machine translation, information retrieval, information extraction, and text categorization. Bilingual terminology extraction plays an important role in bilingual dictionary compilation, bilingual ontology construction, machine translation, and cross-language information retrieval. This paper addresses monolingual terminology extraction and bilingual term alignment based on multi-level termhood.

Design/methodology/approach: A method based on multi-level termhood is proposed. The method computes the termhood of each terminology candidate, as well as of the sentence that contains it, by corpus comparison. Because terminology and general words usually have different distributions in a corpus, termhood can also serve as a constraint that improves term alignment when aligning bilingual terms on a parallel corpus. The paper presents bilingual term alignment based on termhood constraints.

Findings: Experimental results show that multi-level termhood performs better than the existing method for terminology extraction. When termhood is used as a constraint factor, the performance of bilingual term alignment improves.

Originality/value: The termhood of both the candidate term and the sentence that contains it is used for terminology extraction; this combination is called multi-level termhood, and it is computed by corpus comparison. The experimental results show that multi-level termhood performs better than the standard method. A bilingual term alignment method based on a termhood constraint is put forward, and termhood is applied to the task of bilingual terminology extraction. Experimental results show that termhood constraints can improve the performance of terminology alignment to some extent.
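The abstract does not give the scoring formulas, so the following is only an illustrative sketch of the general idea of termhood by corpus comparison. It assumes a smoothed log-ratio of a word's relative frequency in a domain corpus to its relative frequency in a general corpus, a sentence score equal to the average of its word scores, and a simple weighted blend of the two levels. The function names, the `smoothing` and `weight` parameters, and the toy corpora are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def make_termhood(domain_tokens, general_tokens, smoothing=1.0):
    """Return a scorer: log ratio of smoothed domain vs. general relative frequency."""
    dom, gen = Counter(domain_tokens), Counter(general_tokens)
    vocab_size = len(set(dom) | set(gen))
    dom_total = sum(dom.values()) + smoothing * vocab_size
    gen_total = sum(gen.values()) + smoothing * vocab_size

    def termhood(word):
        p_dom = (dom[word] + smoothing) / dom_total
        p_gen = (gen[word] + smoothing) / gen_total
        return math.log(p_dom / p_gen)

    return termhood


def sentence_termhood(sentence, termhood):
    """Average word-level termhood over one tokenized sentence (assumed definition)."""
    if not sentence:
        return 0.0
    return sum(termhood(w) for w in sentence) / len(sentence)


def multi_level_termhood(candidate, sentences, termhood, weight=0.5):
    """Blend the candidate's own score with the mean score of the sentences containing it."""
    containing = [s for s in sentences if candidate in s]
    if not containing:
        return termhood(candidate)
    context = sum(sentence_termhood(s, termhood) for s in containing) / len(containing)
    return weight * termhood(candidate) + (1 - weight) * context


# Toy data, invented purely for illustration.
domain_sentences = [
    ["bilingual", "term", "alignment", "uses", "the", "parallel", "corpus"],
    ["termhood", "ranks", "the", "term", "candidates"],
]
general_tokens = ["the", "cat", "sat", "on", "the", "mat", "and", "the", "dog", "ran"]
domain_tokens = [w for s in domain_sentences for w in s]

score = make_termhood(domain_tokens, general_tokens)
ranked = sorted(
    set(domain_tokens),
    key=lambda w: multi_level_termhood(w, domain_sentences, score),
    reverse=True,
)
print(ranked[:3])
```

In this sketch a word that is frequent in the domain corpus but rare in the general corpus scores high, while a function word such as "the" scores low. The same score could in principle be used to filter or reweight candidate pairs during bilingual alignment, though how the paper applies the constraint is not specified in the abstract.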


Similar resources

Termhood-Based Comparability Metrics of Comparable Corpus in Special Domain

Cross-Language Information Retrieval (CLIR) and machine translation (MT) resources, such as dictionaries and parallel corpora, are scarce and hard to come by for special domains. Moreover, these resources are limited to a few languages, such as English, French, and Spanish. Obtaining comparable corpora automatically for such domains could therefore be an answer to this problem effective...


A Study on Terminology Extraction Based on Classified Corpora

Algorithms for automatic term extraction in a specific domain should consider at least two issues, namely unithood and termhood (Kageura, 1996). Unithood refers to the degree to which a string occurs as a word or a phrase. Termhood (Chen Yirong, 2005) refers to the degree to which a word or a phrase occurs as a domain-specific concept. Unlike unithood, termhood has not yet been widely studied. In cla...


Terminology-driven Augmentation of Bilingual Terminologies

This paper proposes a way of augmenting bilingual terminologies by using a “generate and validate” method. Using existing bilingual terminologies, the method generates “potential” bilingual multi-word term pairs and validates their status by searching web documents to check whether such terms actually exist in each language. Unlike most existing bilingual term extraction methods, which use para...


Pattern Based Term Extraction Using ACABIT System

In this paper, we propose a pattern-based term extraction model for Japanese that applies the ACABIT system developed for French. The proposed model evaluates termhood using morphological patterns of basic terms and term variants. After extracting term selections, the ACABIT system filters non-terms out of the selections based on a simple log-likelihood evaluation. This approach would be suitable to Japanese t...


Mutual Bilingual Terminology Extraction

This paper describes a novel methodology to perform bilingual terminology extraction, in which automatic alignment is used to improve the performance of terminology extraction for each language. The strengths of monolingual terminology extraction for each language are exploited to improve the performance of terminology extraction in the other language, thanks to the availability of a sentence-l...



Journal:
  • The Electronic Library

Volume 30  Issue

Pages  -

Publication date 2012